Automatic segmentation of film dialogues into phonemes and graphemes
Abstract
In film post-production, efficient methods for re-recording dialogue or dubbing it into a new language require a precisely time-aligned text, with individual letters time-coded to video-frame resolution. Currently, experts perform this time alignment in a painstaking, slow process. To automate it, we used CRIM’s large-vocabulary HMM speech recognizer as a phoneme segmenter and measured its accuracy on typical film extracts in French and English. Our results reveal several characteristics of film dialogue, in addition to noise, that affect segmentation accuracy, such as speaking style and reverberant recordings. Despite these difficulties, an HMM-based segmenter trained on clean speech can still provide more than 89% acceptable phoneme boundaries on typical film extracts. We also propose a method that establishes the correspondence between the aligned phonemes and the graphemes of the text. Rather than using explicit rules, the method computes an optimal string alignment according to an edit-distance metric. Together, HMM phoneme segmentation and phoneme-grapheme correspondence meet film post-production's need for a time-aligned text and make it possible to automate a large part of the current post-synch process.
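The phoneme-grapheme step can be pictured as a weighted edit-distance alignment between the segmenter's phoneme string and the letters of the script. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives neither the cost metric nor the phoneme inventory, so the letter-compatibility table `SPELLINGS`, the unit gap cost, the SAMPA-like phoneme labels, the 24 fps frame rate, and the choice to time-code silent letters with the preceding phoneme are all hypothetical, as are the names `align` and `letter_frames`. The paper's method avoids explicit rules; the small table here only stands in for whatever phoneme-letter cost its metric actually uses.

```python
import math
from typing import List, Optional, Tuple

# HYPOTHETICAL cost table: which letters may spell each phoneme
# (SAMPA-like labels). It only stands in for the paper's edit-distance
# metric, which the abstract does not specify.
SPELLINGS = {
    "S": {"c", "s"},       # /S/ as in French "chat"
    "a": {"a"},
    "t": {"t"},
    "k": {"c", "k", "q"},
}
GAP = 1  # cost of a silent letter, or of a phoneme with no letter


def sub_cost(phone: str, letter: str) -> int:
    """0 if the letter can spell the phoneme, 1 otherwise."""
    return 0 if letter in SPELLINGS.get(phone, set()) else 1


def align(phones: List[str], letters: str) -> List[Tuple[Optional[int], Optional[int]]]:
    """Minimum-cost alignment by dynamic programming (Levenshtein-style).

    Returns (phone_index, letter_index) pairs in order; None marks a gap.
    """
    n, m = len(phones), len(letters)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * GAP
    for j in range(1, m + 1):
        d[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(phones[i - 1], letters[j - 1]),
                d[i - 1][j] + GAP,  # phoneme with no letter
                d[i][j - 1] + GAP,  # silent letter
            )
    # Trace back from the bottom-right cell, preferring matches.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub_cost(
            phones[i - 1], letters[j - 1]
        ):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + GAP:
            pairs.append((None, j - 1))
            j -= 1
        else:
            pairs.append((i - 1, None))
            i -= 1
    pairs.reverse()
    return pairs


def letter_frames(
    phones: List[Tuple[str, float, float]], letters: str, fps: float = 24.0
) -> List[int]:
    """Give each letter the video frame where its phoneme starts.

    phones holds (label, start_s, end_s) from the segmenter. Silent letters
    inherit the start of the preceding phoneme, or of the first phoneme
    if none precedes them (an assumption).
    """
    pairs = align([p[0] for p in phones], letters)
    frames = [0] * len(letters)
    last_start = phones[0][1] if phones else 0.0
    for pi, li in pairs:
        if pi is not None:
            last_start = phones[pi][1]
        if li is not None:
            frames[li] = math.floor(last_start * fps)
    return frames


# French "chat" = /S a/: "c" spells /S/, "h" and "t" are silent.
print(letter_frames([("S", 0.125, 0.25), ("a", 0.25, 0.5)], "chat"))
# → [3, 3, 6, 6]
```

Because the alignment is a global minimum-cost path, a multi-letter spelling such as "ch" comes out as one matched letter plus a gap, without listing digraphs separately; a real system would replace the toy 0/1 costs with a graded phoneme-letter metric.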
Publication date: 2003